/* Text file, 1992-11-22, 45KB, 940 lines */
/* Add this file to an EnterAct project (and update the dictionary)
for a bit of help with hAWK programming.
*/
/* This help is general, and defines the following terms:
term what’s in the note
---- --------------------
variables hAWK’s built–in variables
array general discussion
in the “in” operator
BEGIN about BEGIN blocks
END ditto END blocks
regular a discussion of regular expressions,
with examples
patterns general discussion
operators table, precedence and definition
numeric (functions) table
string (functions) table
control for while do etc
function how to define one, etc
print
printf
redirect redirecting input and output
To look up one of the above defined terms, type its name (or click at the end
of the name) and press the <Enter> key. If you have the AutoLook window open,
the definition will appear there if you type the name or double–click
on it (or click at the end of it, or paste it...)
*/
struct variables
{/*
Built–in variables
hAWK's built-in variables are:
ARGC the number of input files plus one
ARGV array of command line arguments. The array is indexed from
0 to ARGC - 1, the input file names being ARGV[1] through
ARGV[ARGC-1]. Dynamically changing the contents of ARGV
can control the files used for data.
FILENAME the name of the current input file. If no files are
specified on the command line, the value of FILENAME is
"-". A hAWK program may do all of its work in a BEGIN
block, with no need for input (generating a list of random
numbers for example).
FNR the input record number in the current input file. Reset
to 1 when starting a new input file. Hence the pattern
“FNR == 1” detects the start of each file.
FS the input field separator, a blank by default. If the
default FS is used then leading blanks and tabs are
trimmed from $1.
IGNORECASE controls the case-sensitivity of all regular expression
operations. If IGNORECASE has a non-zero value, then
pattern matching in rules, field splitting with FS,
regular expression matching with ~ and !~, and the gsub(),
index(), match(), split(), and sub() pre-defined
functions will all ignore case when doing regular
expression operations. Thus, if IGNORECASE is not equal to
zero, /aB/ matches all of the strings "ab", "aB", "Ab",
and "AB". The initial value of IGNORECASE is zero, so all
regular expression operations are normally case-sensitive.
NF the number of fields in the current input record.
NR the total number of input records seen so far.
OFMT the output format for numbers, %.6g by default.
OFS the output field separator, a blank by default.
ORS the output record separator, by default a newline.
RS the input record separator, by default a newline. RS is
exceptional in that only the first character of its string
value is used for separating records. If RS is set to the
null string, then records are separated by blank lines.
When RS is set to the null string, then the newline
character always acts as a field separator, in addition to
whatever value FS may have.
RSTART the index of the first character matched by match(); 0 if no match.
RLENGTH the length of the string matched by match(); -1 if no match.
SUBSEP the character used to separate multiple subscripts in array
elements, by default "\034" (decimal 28), a character very rare in text
that displays as a kind of up arrow.
(and three added for the Macintosh version)
RUNERR short for "run error", a file name that you can use to print
your own error messages to, as in
print "Error during run" > RUNERR
Default name is $tempRunErr, and you'll find the file
in the same folder as $tempStdOut.
STDPATH path name that can be prefixed to any file name you wish to be
written to the same folder as stdout ($tempStdOut). Typically
looks like
"Disk:folder1...:folderN:" and typical use looks like
outname = "MyOutFile"
fullOutName = STDPATH outname;
print "something" > fullOutName;
TIME at start of run, eg "Sunday, October 13, 1991 07:58 AM"
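For C programmers, here is a rough C sketch (not hAWK's source) of how NR and FNR advance when two input files are read in turn. The two "files" are made-up line lists and count_records is a hypothetical helper: NR keeps counting across files while FNR restarts at 1, which is what makes “FNR == 1” a start-of-file test.

```c
#include <stddef.h>

// Hypothetical input: two small "files", each a NULL-terminated list of lines.
static const char *file1[] = {"alpha", "beta", NULL};
static const char *file2[] = {"gamma", NULL};
static const char **files[] = {file1, file2};

// Reads the "files" in turn and records, for each input record, the values
// NR and FNR would hold. Returns the total number of records read.
size_t count_records(int nr_out[], int fnr_out[], size_t max)
{
    int nr = 0;                 // NR: never reset
    size_t n = 0;
    for (size_t f = 0; f < sizeof files / sizeof files[0]; ++f) {
        int fnr = 0;            // FNR: restarts for each new file
        for (const char **line = files[f]; *line != NULL; ++line) {
            ++nr;
            ++fnr;
            if (n < max) {
                nr_out[n] = nr;
                fnr_out[n] = fnr;
            }
            ++n;
        }
    }
    return n;
}
```

For the three records above, NR runs 1, 2, 3 while FNR runs 1, 2, 1.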
*/};
struct array
{/*
Arrays are subscripted with an expression between square brackets,
arr[expr]. Array values can be numbers or strings, but the index is
always interpreted as a string. For example, when you write
arr[1]
the 1 is converted to the string "1" for use as the array index, so arr[1]
is the same as arr["1"]. This sort of array is called “associative” since
it can associate one string of text with any other, eg
arr["John Henry"] = "was a log-drivin man"
If the index expression is an expression list (expr1, expr2, ...)
then the array subscript is a string consisting of the concatenation of
the (string) value of each expression, separated by the value of the
SUBSEP variable, which is by default “\034” (decimal 28, an up arrow).
This facility is used to simulate multiply–dimensioned arrays. For
example:
i = "A"; j = "B"; k = "C"
x[i, j, k] = "hello, world"
assigns the string "hello, world" to the element of the array x
which is indexed by the string "A\034B\034C".
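As a minimal C sketch (make_key is a hypothetical name, and exactly three subscripts are assumed), this is the string key that x[i, j, k] ends up using: the subscript strings joined with SUBSEP between them.

```c
#include <stdio.h>
#include <string.h>

#define SUBSEP "\034"  // hAWK's default SUBSEP: octal 034, decimal 28

// Builds the single string subscript that x[i, j, k] uses:
// the three subscripts joined with SUBSEP between them.
void make_key(char *out, size_t size, const char *i, const char *j, const char *k)
{
    snprintf(out, size, "%s" SUBSEP "%s" SUBSEP "%s", i, j, k);
}
```

Since the key is only a string, (i, j, k) in x tests for exactly this joined string, and split(key, parts, SUBSEP) should recover the separate subscripts.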
*/};
struct in
{/*
The special operator "in" may be used in an if statement to see if an
array has an index consisting of a particular value:
if (val in array)
print array[val]
If the array has multiple subscripts i, j, k, use
if ((i, j, k) in array) instead. The alternative
if (array[val] != "")
actually creates the element array[val] if it does not already exist, so
using “in” is usually better.
The "in" construct may also be used in a for loop to iterate over all the
elements of an array:
for (i in arr)
delete arr[i]
An element may be deleted from an array using the delete statement. New
elements should not be added to an array while looping over it with the
"in" for-loop, since hAWK isn’t quite smart enough to handle that very
well.
*/};
struct BEGIN
{/*
BEGIN and END are two special kinds of patterns which are not tested
against the input. The action parts of all BEGIN patterns are merged as if
all the statements had been written in a single BEGIN block. They are
executed before any of the input is read. Similarly, all the END blocks
are merged, and executed when all the input is exhausted (or when an exit
statement is executed). BEGIN and END patterns cannot be combined with
other patterns in pattern expressions. BEGIN and END patterns cannot have
missing action parts.
BEGIN {FS = ",[ \t]*|[ \t]+"}
sets the field separator to either a comma followed by optional blanks and
tabs or one or more blanks and tabs—a common field separator in a real
database.
END blocks are often used to finish up after all the input has been seen,
as in this little program:
{out[++n] = $0}
END {for (i = n; i >= 1; --i) print out[i]}
which accumulates all input records in the array “out”, and then at the
end prints out the records in reverse order.
*/};
struct END
{/*
BEGIN and END are two special kinds of patterns which are not tested
against the input. The action parts of all BEGIN patterns are merged as if
all the statements had been written in a single BEGIN block. They are
executed before any of the input is read. Similarly, all the END blocks
are merged, and executed when all the input is exhausted (or when an exit
statement is executed). BEGIN and END patterns cannot be combined with
other patterns in pattern expressions. BEGIN and END patterns cannot have
missing action parts.
BEGIN {FS = ",[ \t]*|[ \t]+"}
sets the field separator to either a comma followed by optional blanks and
tabs or one or more blanks and tabs—a common field separator in a real
database.
END blocks are often used to finish up after all the input has been seen,
as in this little program:
{out[++n] = $0}
END {for (i = n; i >= 1; --i) print out[i]}
which accumulates all input records in the array “out”, and then at the
end prints out the records in reverse order.
*/};
struct regular /*expression */
{/*
A regular expression is nothing more than a string of text with optional
special “metacharacters”, and in most cases the string to be used can
result from the evaluation of a variable, or the concatenation of several
strings or variables. This means you can build the regular expressions for
your program during the execution of your program, modifying them on the
fly to suit changing circumstances.
Parts of a regular expression can be grouped (with ordinary parentheses),
and later in the regular expression or in a replacement string can be
referred to by the group “tags” \1, \2, ... \9 where \1 refers to the
group started by the first left parenthesis, \2 to the second, etc. These
allow you to match a small pattern within the context of a larger one,
detect duplicate expressions, change the order of the groups and so on.
Note that parentheses have the highest precedence of all regular
expression “operators”, so they serve two purposes: changing the order in
which the metacharacters apply, and marking the boundaries of a group, for
later reference via \1..\9. More on this in a bit.
Regular expressions are built from ordinary characters, the escape
sequences
\t \n \b \B \w \W \< \> \1 \2 \3 \4 \5 \6 \7 \8 \9
and from the metacharacters
\ ^ $ . [ ] | ( ) * + ?
which are the ones with the special powers mentioned above. If a regular
expression contains no metacharacters then it behaves like an ordinary
“find” string, in that each character in the regular expression must
match a character in the string being searched.
The following table summarizes all character usage in a regular expression
(where a b c are ordinary characters, m is a metacharacter, r is a regular
expression, and d is a digit):
c matches the non-metacharacter c itself
\m matches the literal character m, eg \$ matches the dollar sign.
. matches any single character except newline.
^ matches the beginning of a line or a string.
$ matches the end of a line or a string.
[abc...] character class, matches any one of the characters a or b or c etc.
[^abc...] negated character class, matches any character except abc...
and newline. (Ranges of characters may be abbreviated in
character classes, as in [0-9] which matches any digit,
[A-Za-z] which matches any letter, [^0-9] which matches
anything but a digit)
\w matches a “word” character, exactly equivalent to [0-9A-Za-z_]
\W matches a non-word character, ie [^0-9A-Za-z_]
\< matches the beginning of a word.
\> matches the end of a word.
\b matches the beginning or end of a word (a word boundary).
\B matches the boundary (beginning or end) of a set
of non-word characters.
\t matches a tab.
\n matches a newline (the Return key).
r1 | r2 alternation: matches either r1 or r2, eg "blue|green"
r1r2 concatenation: matches r1 followed by r2 .
r+ matches one or more r's.
r* matches zero or more r's. (Note that zero r's can
be anywhere in the text)
r? matches zero or one r.
(r) grouping: matches r. Parentheses have two distinct uses:
to override default precedence of metacharacter operators, and
to tag a subexpression for subsequent reference.
\1...\9 stand for whatever text the first through ninth set of
parentheses currently match, counting opening parentheses from
left to right. Note that if the pair of parentheses has a + or
* or ? operator after it, then all of the matches are
included, eg /(foo)+bar/ applied to "foofoofoobar" will set \1
to "foofoofoo". To get just the first foo, use /(foo)\1*bar/ -
then \1 is set to "foo". (Perl users note this is the opposite
of what you are used to).
\ddd is interpreted as an octal number, as in C. The digits
exclude 8 and 9, needless to say, and there can be from 1 to 3
digits in the number. Note that \1 through \7 are interpreted
as subexpression tags unless followed immediately by another
octal digit (eg \23 is not tag 2 followed by a 3; it is octal 23,
which is 19 decimal). \8 and \9 are always tags, since 8
and 9 are not octal numbers. To refer to octal numbers 1 to 7,
use \01 to \07. To follow a tag with a low number (eg \2
followed by 3), use the octal representation of the number (eg
\2\063 -- \063 equals 51 decimal, the ASCII code for 3).
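The rule for \1 through \9 versus octal codes can be summed up in a short C sketch. classify_escape is a hypothetical helper written from the description above, not taken from hAWK; it looks at the characters following a backslash and reports either a tag number or an octal character code.

```c
#include <stdbool.h>

static bool is_octal_digit(char c)
{
    return c >= '0' && c <= '7';
}

// Assumes s[0] is a digit (the character right after the backslash).
// Returns true for a group tag, false for an octal character code.
// *value receives the tag number or the code; *len receives how many
// characters after the backslash were consumed.
bool classify_escape(const char *s, int *value, int *len)
{
    if (s[0] == '8' || s[0] == '9') {          // never octal, always a tag
        *value = s[0] - '0';
        *len = 1;
        return true;
    }
    if (s[0] >= '1' && s[0] <= '7' && !is_octal_digit(s[1])) {
        *value = s[0] - '0';                    // lone 1..7: a tag
        *len = 1;
        return true;
    }
    int v = 0, n = 0;                           // otherwise 1 to 3 octal digits
    while (n < 3 && is_octal_digit(s[n])) {
        v = v * 8 + (s[n] - '0');
        ++n;
    }
    *value = v;
    *len = n;
    return false;
}
```

For instance, given “23” it reports octal 19, given “2x” it reports tag 2, and given “063” (as in \2\063) it reports code 51, the digit '3'.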
The metacharacters ^ and $, which match the beginning and end of strings,
and the escapes \b \B \< \>, which match various boundaries, don’t actually
match any characters; rather they force alignment to a particular text
position. For
example, /\brun\b/ will always match just “run” if it matches anything,
but will not match "runner" or "brunt". By comparison, /\Wrun\W/ won’t
match “runner” or “brunt” either, but it will include any non–word
character that happens to come before or after the word “run”. Normally
you won’t want to include leading or trailing spaces etc in the match.
Parentheses () have the highest precedence, allowing you to override
default precedence when needed. The “repetition” operators * + ? have the
next–highest precedence, followed by concatenation, with alternation
having the lowest precedence of all. For example, in abc*d the * applies
only to the c since the repetition operator acts before concatenation, and
in abd|def the | applies to abd and def since concatenation binds them
together into little groups of three before alternation can play.
Regular expressions can be used to just locate an instance of a
pattern, as in
$0 ~ /extern/
but they can also be used to specify text for replacement, by using the
“sub” and “gsub” functions. Looking ahead just a bit, these functions take
a regular expression as the first argument, the string to use for
replacement as the second argument, and the string to do the search and
replace in as the third argument, with $0 used by default if there is no
third argument. “sub” does a single substitution on the text, and “gsub”
does all possible non-overlapping substitutions. Within the replacement
strings of these functions, you can use \1 through \9 to refer to text
currently matched by tagged subexpressions, and the ampersand “&” stands
for all of the text that was matched. To put a plain ampersand in the
replacement, use “\&”.
At this point some considerable exampling usually helps:
The quick brown matches just that, "The quick brown". Note it
would match "The quick brown" in "The quick brownie".
red fox\. matches "red fox." (the period must be
escaped for a literal match).
[ \t] matches a single space or tab
(that’s a space before the \).
[ \t]+ matches any consecutive run of spaces and tabs
in any mix.
[0-9]+ matches an integer (read “one or more digits”)
[+-]?[0-9]+ matches an integer, together with any preceding sign.
[A-Za-z]+ matches an English word (unhyphenated).
houses? matches "house" or "houses".
m(iss)*ippi matches "mippi", "missippi", "mississippi",
"missississippi", etc.
ar*g matches "ag", "arg", "arrg", "arrrg", etc.
MyFunction\( matches "MyFunction(".
array\[index\] matches "array[index]".
array\[.+\] matches "array[i]", "array[j]", "array[2*q-1]", etc.
\\([0-7]|[0-7][0-7]) matches "\d" or "\dd" where d is an octal digit.
(^|[^\\])(\\\\)*" (horrors, be brave) matches a quote preceded by an
even number (possibly zero) of backslashes, which
are in turn preceded by the start of the string
or a non-backslash character; in other words a
true quote in C. The backslash is a
metacharacter, so matching one literally
requires a backslash before the backslash.
The[ \t]+quick[ \t]+brown matches "The quick brown" with variable
spaces and tabs between the words.
\/\* matches the start of a C comment, "/ *". The
forward slash is escaped so that you can place the
whole regular expression inside forward slashes.
The escape before '/' would not be needed if you
placed the expression inside quotes, but then you
would need two escapes before the '*', ie "/\\*".
\/\*.*\*\/ matches all of a one–line C comment,
"/ * - anything - * /".
^Z matches a 'Z' at the beginning of a string.
^. matches the first character of a string.
.$ matches the last character of a string.
^.*$ matches any string completely (and is therefore useless).
^A..$ matches any string which is three characters long,
the first being an 'A'.
^(A|B).* matches all of any string that begins with 'A' or 'B'.
^[AB].* does likewise.
\w+ matches a C identifier or an integer constant.
((->)|(\.))(mem\b) matches “mem” when it is immediately preceded by “->”
or “.”, and is not the beginning of a longer word.
For replacement purposes in a “sub” or “gsub”, the
part before “mem” is given by \1, and mem itself
is \4.
gsub(/((->)|(\.))(mem\b)/, "\1\4ber") will turn “->mem” into “->member”
and “.mem” into “.member” everywhere in the
current input line $0, ignoring things like
“remember” or “->memories”.
gsub(/\bFuncName([ \t]*\()/, "FunctionName\1") will replace “FuncName” by
“FunctionName” everywhere in the current input
line $0, provided it is followed on the same line
by an opening parenthesis, with optional spaces or
tabs between the name and “(”. The match extends
from the “F” of “FuncName” up to and including the
“(”, so the “(” and any intervening white space
must be put back into the replacement string by
tagging them in parentheses and using \1 after
“FunctionName” to refer to what was matched by the
first set of parentheses in the pattern.
Within a character class most metacharacters are taken literally. The
exceptions are the escaping backslash \, the negating ^ (only at the
beginning), and the range hyphen - (only between two characters). For
example,
[A-Za-z-]+ matches an English word, hyphens included
[-A-Za-z]+ does the same
[\-A-Za-z]+ also does the same (the '\' is unnecessary but harmless)
^[^^] matches any single character that is not a '^' at
the beginning of a string
[\^] matches a '^'.
The toughest metacharacter to remember is the '^' which has three
meanings: at the beginning of a character class it signals a negated
character class; outside of a character class it matches the beginning of
a string; and when escaped, or when it is not the first character in a
character class, it matches a literal '^'.
Regular expressions are “left greedy”; where there could be more than one
match in a string, a regular expression matches the leftmost one, and
extends the match as far as possible.
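POSIX extended regular expressions follow the same leftmost, longest rule, so on a POSIX system the C library's regcomp() and regexec() can be used to experiment with it for patterns that stick to the common syntax (POSIX extended expressions have no \w, \<, \b or \1 tags). This is a sketch using the C library, not hAWK's own matcher, and leftmost_match is a hypothetical helper.

```c
#include <regex.h>

// Finds the first (leftmost) match of the extended regular expression
// `pattern` in `text`. On success returns 1 and sets *start and *end
// (end is one past the last matched character); returns 0 if there is
// no match or the pattern does not compile.
int leftmost_match(const char *pattern, const char *text, int *start, int *end)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    int found = regexec(&re, text, 1, &m, 0) == 0;
    if (found) {
        *start = (int)m.rm_so;
        *end = (int)m.rm_eo;
    }
    regfree(&re);
    return found;
}
```

With the pattern ar*g and the text “xx arrrg and ag”, the match starts at offset 3 and ends at offset 8: the leftmost candidate wins, and r* takes all three r's rather than stopping early.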
Now that we’re starting to get the hang of things, more examples using the
replacement functions “sub” and “gsub” mentioned above. The format is
sub(r,s,t) where r is a regular expression, s is the replacement string,
and t is the string in which the search and replace is to be done. The
contents of t before and after the sub are spelled out below.
using t = "Don’t run that prune over, runt!":
sub(/run/, "fly", t) turns t into "Don’t fly that prune over, runt!"
gsub(/run/, "fly", t) turns t into "Don’t fly that pflye over, flyt!"
gsub(/\brun\b/, "fly", t) turns t into "Don’t fly that prune over, runt!"
gsub(/run/, "t&k", t) turns t into "Don’t trunk that ptrunke over, trunkt!"
using t = "#define FOO 1":
sub(/#define\W+(\w+)\W+([0-9]+)/, "int \1 = \2;",t) turns t into
"int FOO = 1;" (\W+ means one or more non-word characters, \w+
means one or more word characters, [0-9]+ means one or more digits;
two groups are tagged).
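To show how & and \& behave in a replacement string, here is a simplified C sketch of gsub() for a plain search string with no metacharacters. Regular expressions and the \1..\9 tags are left out, and gsub_literal is a hypothetical name. Scanning resumes just after each match, which is why the substitutions never overlap.

```c
#include <stddef.h>
#include <string.h>

// Replaces every non-overlapping occurrence of the plain string `find`
// in `t`, scanning left to right, and writes the result to `out`
// (assumed large enough). In `repl`, '&' stands for the matched text
// and the two characters \& give a literal '&'.
// Returns the number of substitutions, as gsub does.
int gsub_literal(const char *find, const char *repl, const char *t, char *out)
{
    size_t flen = strlen(find);
    int count = 0;
    if (flen == 0) {                     // keep the sketch simple: no-op
        strcpy(out, t);
        return 0;
    }
    while (*t != '\0') {
        if (strncmp(t, find, flen) == 0) {
            for (const char *r = repl; *r != '\0'; ++r) {
                if (r[0] == '\\' && r[1] == '&') {   // \& : literal '&'
                    *out++ = '&';
                    ++r;
                } else if (*r == '&') {              // & : the matched text
                    memcpy(out, t, flen);
                    out += flen;
                } else {
                    *out++ = *r;
                }
            }
            t += flen;                               // resume after the match
            ++count;
        } else {
            *out++ = *t++;
        }
    }
    *out = '\0';
    return count;
}
```

Applied to the example above, gsub_literal("run", "t&k", ...) turns “Don't run that prune over, runt!” into “Don't trunk that ptrunke over, trunkt!” and returns 3. In a C string literal the two characters \& are written "\\&".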
Three programs are supplied to help you do general–purpose listing of
matches or search–and–replace; $MFSLister searches for either plain text
or a regular expression (set up with “Set variables” in the setup dialog), and
lists the file name and line number of all single–line matches to stdout;
$MFS_SuperLister does much the same, but finds matches that span a
variable number of lines; and $MFS_SuperReplace does the ultimate search
and replace, matching either plain text or full–blown regular expressions
over a variable number of lines, handling any number of files at once,
documenting the (post–change) locations of all changes to stdout. Heck, it
even prints the fragments of original text before the changes, so that if
you mess up you can at least (manually) undo the damage.
*/};
struct patterns
{/*
Summary of patterns
A list of beasts in the pattern zoo (regex stands for regular expression,
pat stands for pattern, str stands for string variable):
Pattern Example
---------------- -------------------------------
BEGIN BEGIN blocks are done before all input
END END blocks are done after all input
/regex/ /Mary[ \t]+had/
str ~ /regex/ (or !~) $1 ~ /(\-)?[0-9]+/
str ~ "regex" (or !~) $1 ~ "(\\-)?[0-9]+"
relational expression NF > 4
pattern && pattern FNR == 1 && /File title:/
pattern || pattern /Vermont/ || /Maine/
pattern ? pattern : pattern $3 != 0 ? $2 / $3 > 25 : $2 < 0
( pattern ) - see next line
! pattern !($0 == "" || $0 ~/^The end$/)
pattern1 , pattern2 FNR == 5, FNR == 8
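The table only gives an example of the range pattern. Assuming hAWK follows the usual AWK rule (a range runs from a record matching pattern1 through the next record matching pattern2, inclusive), the selection logic looks something like this C sketch; RangeState and range_select are hypothetical names.

```c
#include <stdbool.h>

typedef struct {
    bool active;   // true while inside a range
} RangeState;

// Decides whether the current record is selected by "pattern1, pattern2",
// given whether each pattern is true for this record. Both ends are
// checked on the same record, so a record matching both patterns
// forms a one-record range.
bool range_select(RangeState *st, bool start_matches, bool end_matches)
{
    if (!st->active) {
        if (!start_matches)
            return false;
        st->active = true;     // range opens on this record
    }
    if (end_matches)
        st->active = false;    // range closes after this record
    return true;
}
```

Fed FNR values 1 through 10 with the tests FNR == 5 and FNR == 8, range_select picks records 5, 6, 7 and 8.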
*/};
struct operators
{/*
The operators in hAWK, in order of increasing precedence, are:
-------------------------------------------------------------
= += -= *= /= %= ^=
Assignment. Both absolute assignment (var = value) and
operator-assignment (the other forms) are supported. “a += b” is
equivalent to “a = a + b”.
?: The C conditional expression. This has the form
expr1 ? expr2 : expr3
If expr1 is true, the value of the expression is expr2, otherwise it is
expr3. Only one of expr2 and expr3 is evaluated.
|| logical OR. In “a || b” if a is true then b is not evaluated.
&& logical AND. In “a && b” if a is false then b is not evaluated.
~ !~ regular expression match, negated match. See “regular” and “patterns”.
< <= > >= != ==
the regular relational operators. Note especially that strings
can be compared, eg if ($3 == "cat"). In “a <= b” or the like,
if both arguments are numbers the comparison is done
numerically, otherwise they are compared as ASCII strings.
blank string concatenation; if a = "John" and b = "Henry" then
c = a b; produces c = "JohnHenry".
+ - addition and subtraction.
* / % multiplication, division, and modulus (x%y produces the
remainder of x divided by y, equivalent to x - int(x/y)*y).
+ - ! unary plus, unary minus, and logical negation.
^ exponentiation.
++ -- increment and decrement, both prefix and postfix.
$ field reference. $0 is the entire current record, $1 the first
field, and $NF the last field. Fields may be changed or added.
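The identity given for % can be checked with a short C sketch. hawk_mod is a hypothetical name, and the cast to long long plays the part of int(), assuming x/y fits in that type. Because int() truncates toward zero, the result takes the sign of x, as with C's fmod(): 7 % 3 is 1 but -7 % 3 is -1.

```c
// x % y computed as x - int(x/y)*y, where int() truncates toward zero.
// The cast to long long does the truncation; it assumes x / y fits.
double hawk_mod(double x, double y)
{
    return x - (double)(long long)(x / y) * y;
}
```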
*/};
struct numeric /*functions */
{/*
Built–in numeric functions
hAWK has the following pre-defined arithmetic functions, with x and y as
arbitrary expressions:
atan2( y , x ) returns the arctangent of y/x in radians.
cos( x ) returns the cosine of x, where x is in radians.
exp( x ) the exponential function "e to the x"
int( x ) truncates toward zero (eg int(7.325) gives 7); to round a
non-negative x, use int(x + .5).
log( x ) the natural logarithm function, base e. For log base 10, use
log(x)/log(10).
rand() returns a random number, 0 <= rand() < 1.
sin( x ) returns the sine of x, where x is in radians.
sqrt( x ) the square root function.
srand( x ) use x as a new seed for the random number generator.
If no x is provided, the time of day will be used. The
return value is the previous seed for the random
number generator.
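A C sketch of the truncation and rounding rules above (the function names are hypothetical, and int() is modeled as a cast, which also truncates toward zero). It shows why int(x + .5) only suits non-negative numbers, and one way to round values of either sign.

```c
// int(x): truncates toward zero, like a C cast to an integer type.
long long hawk_int(double x)
{
    return (long long)x;
}

// The int(x + .5) idiom: right for x >= 0, but negative values come out
// wrong, eg int(-2.7 + .5) is int(-2.2), which is -2 rather than -3.
long long round_plus_half(double x)
{
    return hawk_int(x + 0.5);
}

// Rounds either sign to the nearest integer, halves away from zero.
long long round_either_sign(double x)
{
    return x < 0 ? -hawk_int(-x + 0.5) : hawk_int(x + 0.5);
}
```

The same idea in hAWK would be x < 0 ? -int(-x + .5) : int(x + .5).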
*/};
struct string /*functions*/
{/*
Built–in string functions
There is only one string operator, the concatenation operator, invoked
when two variables or constants are separated by a space. Other useful
string manipulations in hAWK are carried out by built–in functions. In
the following table, r is a regular expression, s and t are strings, a
and b are arrays, and i and n are integers.
gsub(r, s, t) for each substring matching the regular expression r
in the string t , substitutes the string s , and
returns the number of substitutions. If t is not
supplied, uses $0 .
index( s , t ) returns the index of the string t in the string s,
or 0 if t is not present.
length( s ) returns the length of the string s .
match( s , r ) returns the position in s where the regular expression r
occurs, or 0 if r is not present, and sets the values of
RSTART and RLENGTH .
split(s, a, r) splits the string s into the array a on the regular
expression r , and returns the number of fields. If r is
omitted, FS is used instead.
sprintf( fmt , expr-list ) prints expr-list according to fmt , and returns the
resulting string. See the discussion of “printf” for details.
sub(r, s,t) this is just like gsub , but only the leftmost matching
substring is replaced. Returns number of substitutions.
substr(s, i, n) returns the n-character substring of s starting at i . If n
is omitted, the rest of s is used.
tolower( s ) returns a copy of the string s , with all the upper-case
characters in s translated to their corresponding
lower-case counterparts. Non-alphabetic characters are
left unchanged.
toupper( s ) returns a copy of the string s , with all the lower-case
characters in s translated to their corresponding
upper-case counterparts. Non-alphabetic characters are
left unchanged.
lookup ( s ) returns integer–coded C type of s (s should be a word).
At present this function is supported only by EnterAct.
Types are taken from whatever EnterAct project is open
at the time. See “$LookupTest” for an example.
Type integer returned
---- ------------
defined constant or macro 1
file–scope variable 2
function 4
enum constant 8
typedef 16
struct tag 32
union tag 64
enum tag 128
other 0
sort(a,b,s) produces an index in the array “b” that can be used to
access the elements of “a” in sorted order. The string
“s” specifies the kind of sort; "a" for ASCII, "n" for
numeric, "d" for dictionary order, and "ra", "rn",
"rd" for reverse of the same. Returns the number of
elements in the array “b”, which is indexed
numerically from 1 upwards. The elements of “b” are
the indexes of “a” in sorted order provided “b” is
accessed in the sequence b[1], b[2], b[3] etc. Typical
use is
maxIndex = sort(a, b, "d")
for (i = 1; i <= maxIndex; ++i)
print a[b[i]]
which will print the elements of a in sorted
dictionary order. See “$WordFrequency” and
“$XRef_Full” for examples, and “$SortTest_Nums” for a
simple numeric example.
time ( ) returns the current time, eg
"Sunday, October 27, 1991 09:03:30 AM"
—note this is the time when the function
is called, down to the second, whereas the TIME
variable holds the time at which your program run
starts, down to the minute. See “$TIME” for an example.
prompt ( s ) displays an OK/Cancel dialog. The string “s” appears
at the top of the dialog, and you can type in a string
in an edit field. Returns what you type in, as though
it was a string constant. Both the string “s” and what
you type in are limited to 255 characters. For an
example of usage see “$PromptTest” and “$YoungMath”.
Typical use is
x = prompt("Enter the number of lines to print:")
if (x+0 > 0) {
while (getline lne > 0 && ++i <= x) print lne }
If you cancel the dialog or hit <Return> without
typing in any text, prompt returns the null string "".
progress (s) displays the string “s” in a dialog on your screen
(the message stays on the screen). You can change the
message with another “progress” call. “progress”
returns the number of times it has been called, and
the dialog goes away by itself at the end of your
program run. For a test sample, see “$ProgressTest”.
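sort() leaves the array a alone and builds b as a list of a's indexes in sorted order. Below is a simplified C sketch of that idea using the standard qsort() on an index array; sort_index is a hypothetical name, a is assumed to be indexed 1 to n like b, and only the "a" and "n" orders are covered.

```c
#include <stdlib.h>
#include <string.h>

// qsort has no context argument, so the array being ordered is kept here.
static const char **sort_src;
static int sort_numeric;

static int compare_indexes(const void *p, const void *q)
{
    const char *x = sort_src[*(const int *)p];
    const char *y = sort_src[*(const int *)q];
    if (sort_numeric) {
        double dx = strtod(x, NULL), dy = strtod(y, NULL);
        return (dx > dy) - (dx < dy);
    }
    return strcmp(x, y);                 // "a": plain ASCII order
}

// a[1..n] holds the elements; b needs room for n + 1 ints.
// Fills b[1..n] with indexes of a so that a[b[1]], a[b[2]], ... are in
// order, and returns n. kind is "a" (ASCII) or "n" (numeric).
int sort_index(const char *a[], int n, int b[], const char *kind)
{
    for (int i = 1; i <= n; ++i)
        b[i] = i;
    sort_src = a;
    sort_numeric = strcmp(kind, "n") == 0;
    qsort(b + 1, (size_t)n, sizeof b[0], compare_indexes);
    return n;
}
```

With a[1..3] = "10", "9", "100", numeric order gives b = 2, 1, 3, while ASCII order gives b = 1, 3, 2, since "100" sorts before "9" as text.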
Within the replacement string 's' of gsub(r,s,t) and sub(r,s,t), a '&' is
taken to stand for the entire string of text that was matched by the
regular expression 'r'. For example, gsub(/cat/, "&s", t) with t = "cat
and dogs" produces t = "cats and dogs" after the substitution. Use “\&” if
you want a literal '&' in the replacement string.
--and added for hAWK version 2 (mainly file functions):
Note in the functions below where a file or directory name is required it must
be a full pathname, of the form “disk:folder1:folder2:...:folderN:filename”
for a file, or “disk:folder1:...:folderN” or “disk:folder1:...:folderN:”
for a directory (the second version has a colon at the end). For a disk name,
use “disk:” rather than “disk”.
beep( n ) does a SysBeep(n); if the duration "n" is <= 0, the menu bar will
flash instead. Durations of 0,1,2,5 work best.
copy( s, t ) copies the file named “s” to the file named “t”. Both file names
must be full pathnames (disk:folder:...folder:filename). Either
the location or name or both can be changed. If file “t” already
exists, it must be closed and unlocked. Both creator and type are
preserved, and the resource fork is copied as well as the data
fork. Any kind of file can be copied. To move or rename a file, use
if (copy(s,t)) remove(s)
(note this is an efficient way to move a file, but not a very fast
way to rename one).
Returns 1 if successful, 0 if the copy could not be done.
exists( s ) returns 1 if the file named “s” exists, 0 if it does not. Any kind
of file can be tested.
fdate( s ) returns date/time of last modification of file named “s”, format
“yr:mo:day:hr:min:sec” where yr is 4 digits, and the rest are 2
(eg always 01 rather than just 1). The length of the string is
always 19 (or 0 if no date could be extracted) and the colons
and digits always occupy the same positions.
fsize( s ) returns size in bytes of the data fork only of the file named “s”
getclip( n ) returns the calling application’s current clipboard text, up to
a maximum of the first “n” bytes. Use n = 0 or omit it entirely
if you want the entire clipboard. For example, if the current
clip is “Some text here” then getclip(6) returns “Some t”
whereas getclip(0) or getclip() returns the entire clip. At
present this function is supported by: EnterAct.
list( s, a ) given file or directory full pathname in “s”, produces list of
full pathnames for all TEXT files in the directory (either the
directory named or the directory holding the file), as elements
indexed 1,2,3... in the array “a”. Note subdirectories are also
excluded. Returns the number of files in the list.
nested( s, a ) given a file full pathname in “s”, generates list of full pathnames
for directories at the same level ("sibling folders"); given directory
name, generates list of subdirectories at the top level in the named
directory (“offspring directories”). The list is returned as elements
indexed 1,2,3... in the array “a”. In other words, the same as
“list” but for folders rather than TEXT files. Note neither “list”
nor “nested” look beneath the top level of the folder in question.
Returns the number of directories in the list.
remove( s ) deletes the file named “s”, provided it is closed and unlocked. Use
with caution, this is not undoable unless you get lucky using your
favourite file recovery tool. Returns 1 if the file was deleted,
0 otherwise.
rename( s, t ) takes the file with full pathname “s”, and renames it “t”. The
new name “t” can be a full pathname, or just the new file name
proper, as in
rename("Disk:dir1:aardvark", "Disk:dir1:fruitbat")
or equivalently
rename("Disk:dir1:aardvark", "fruitbat")
This function works only with files, not directories or volumes,
returning 1 if the rename was carried out, 0 if not.
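Because fdate() always returns the same 19-character layout, its fields sit at fixed positions: in hAWK, substr(d, 1, 4) is the year, substr(d, 6, 2) the month, substr(d, 9, 2) the day, and so on. A C sketch of splitting such a string (parse_fdate is a hypothetical name):

```c
#include <stdio.h>
#include <string.h>

// Splits an fdate() string "yr:mo:day:hr:min:sec" (always 19 characters)
// into its six numeric fields, year first.
// Returns 1 on success, 0 for the empty string or a malformed value.
int parse_fdate(const char *s, int f[6])
{
    if (strlen(s) != 19)
        return 0;
    return sscanf(s, "%4d:%2d:%2d:%2d:%2d:%2d",
                  &f[0], &f[1], &f[2], &f[3], &f[4], &f[5]) == 6;
}
```

Since every field is zero-padded, comparing two fdate() results as plain strings also orders them by time, so when both dates are available fdate(s) > fdate(t) should be true if s was modified more recently.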
*/};
struct control
{/*
In the following list of control statements, any instance of “statement”
can be replaced by a group of statements enclosed in curly braces {}:
{ statements }
Simple grouping of several statements together, so that conditional or
repeated execution can be applied to the group.
if (condition) statement1 [ else statement2 ]
If the condition evaluates to true then statement1 is carried out; the
“else” clause is optional, and statement2 will be executed if the
condition is false.
while (condition) statement
The condition is first evaluated, and if it is false then the
statement is skipped. If it is true then the statement is executed;
the condition is again evaluated, and the statements again executed if
the condition is true, and this process continues until the condition
is false. Note that if the condition is false the first time then the
statement will not be executed at all. “while” loops are affected by
break and continue statements.
do statement while (condition)
The statement is always executed at least once; then the condition is
evaluated, and if it is true then the statement is executed again. This
process continues until the condition is false. That first unconditional
pass is what distinguishes “do” from “while”, whose statement may never
run at all.
for (expr1; expr2; expr3) statement
eg “for (i = 1; i <= 6; ++i) {print i}”
Mnemonically, “for it’s (a jolly good fellow)” helps: the “i” stands
for initialization, the “t” for “test”, and the “s” for “step”. expr1
is the initialization, executed only once, just before the “for” loop
proper is entered. Next expr2, the test, is evaluated, and if it is
true then the statement is executed, otherwise the for loop ends and
control passes to the next statement beyond it. If the statement is
executed then expr3, the step, is carried out, and then it’s back to
the top of the loop: no more initialization, but the cycle of test,
execute, step continues until the test produces false.
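To see where each part of a “for” goes, here is the same sum written both ways, as a sketch in plain AWK (SumFor and SumWhile are hypothetical names):

```awk
# Sketch in plain AWK; SumFor and SumWhile are hypothetical names.
function SumFor(n,    i, total)
{
    for (i = 1; i <= n; ++i)   # initialization; test; step
        total += i
    return total + 0
}
function SumWhile(n,    i, total)
{
    i = 1                      # initialization, done once
    while (i <= n) {           # test, before every pass
        total += i
        ++i                    # step, after every pass
    }
    return total + 0
}
BEGIN {
    print SumFor(6), SumWhile(6)   # 21 21
}
```

The two forms behave the same unless the body contains a “continue”: in the “for” version the step is still carried out, whereas in the “while” version the “++i” would be skipped.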
for (var in array) statement
Indexes for the array are retrieved one–by–one to the variable “var”,
though not in a readily predictable order, and the statement is
executed for each index.
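Since the order of a “for (var in array)” loop can’t be relied on, it is best used for work that doesn’t depend on order, such as counting or totalling. A sketch in plain AWK (CountElements and the fruit array are hypothetical):

```awk
# Sketch in plain AWK; CountElements and fruit are hypothetical names.
# Only order-independent work is done, since the visiting order is unpredictable.
function CountElements(arr,    key, n)
{
    for (key in arr)
        n++
    return n + 0
}
BEGIN {
    fruit["apple"] = 3; fruit["pear"] = 5; fruit["plum"] = 1
    print CountElements(fruit)   # 3
    for (key in fruit)           # indexes arrive in no particular order
        total += fruit[key]
    print total                  # 9
}
```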
break
For use only among the statements that make up the body of a while,
do, or for loop. It usually appears in the form “if (condition) break”;
when the break is executed, control immediately passes to the statement
following the loop.
continue
Also for use only in a while, do, or for loop, and also usually
executed only when the condition of some if–statement is true. When
encountered, it skips the rest of the statements making up the body of
the loop and starts the next iteration: a “while” or “do” loop evaluates
its condition again, and a “for” loop carries out its step (expr3) and
then its test (expr2).
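A sketch in plain AWK showing both statements at work (FirstNegative, SumPositive and the array v are made up for illustration):

```awk
# Sketch in plain AWK; FirstNegative, SumPositive and v are hypothetical names.
# "break" leaves the loop early; "continue" skips the rest of one pass.
function FirstNegative(arr, n,    i)
{
    for (i = 1; i <= n; ++i)
        if (arr[i] < 0)
            break              # stop at the first negative value
    return (i <= n) ? i : 0    # 0 means none was found
}
function SumPositive(arr, n,    i, sum)
{
    for (i = 1; i <= n; ++i) {
        if (arr[i] <= 0)
            continue           # go on to the step "++i", then the test
        sum += arr[i]
    }
    return sum + 0
}
BEGIN {
    v[1] = 4; v[2] = -2; v[3] = 7; v[4] = -5
    print FirstNegative(v, 4), SumPositive(v, 4)   # 2 11
}
```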
next
Stop processing the current input record. The next input record is
read and processing starts over with the first pattern in the hAWK
program. If the end of the input data is reached, the END block(s), if
any, are executed.
exit [ expression ]
In an END action, exit truly causes the hAWK program to terminate.
Anywhere else, the exit statement causes the program to jump to the
END actions, and only if none are present does the program immediately
terminate. The “expression” is provided for compatibility with
standard AWK programs (where it sets the program’s exit status), and
won’t be of any use to you.
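This sketch, in plain AWK with arbitrary variable names, shows that an “exit” in a BEGIN block skips the input entirely but still runs the END block:

```awk
# Sketch in plain AWK; stage and records are arbitrary variable names.
BEGIN {
    stage = "BEGIN"
    exit                       # no input records will be read
    stage = "never reached"
}
{ records++ }                  # main rule: never runs in this program
END {
    # prints: END still runs; stage = BEGIN records = 0
    print "END still runs; stage =", stage, "records =", records + 0
}
```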
*/};
struct function
{/*
Functions in hAWK take the form:
"function" name(parameter1, parameter2,... local1, local2...)
{
statements
}
They are executed when called from within an action statement.
hAWK function definitions begin with the keyword “function”, and no return
type is declared, though a value may optionally be returned. Local
variables are listed after the parameters for the function, more to
simplify the grammar of the language than anything else. Scalar parameters
are passed by value (ie a local copy is made for the function, and the
original variable in the function call is not touched by the function)
whereas array parameters are passed by reference (the parameter array name
refers to the same array that is provided as the argument). Function
definitions must be placed at the top level of your program outside any
pattern–action blocks, and you generally end up with a readable program if
you put all of your function definitions at the end of your program.
Here’s a typical function:
function Swap(a, i, j,    temp)
{
temp = a[i]
a[i] = a[j]
a[j] = temp
}
When called, it appears for example as
arr[1] = 7; arr[4] = 9; Swap(arr, 1, 4)
which results in arr[1] = 9, arr[4] = 7. Note that the “temp” variable is
intended for use only within the Swap function, and is a local variable
rather than a parameter of the function.
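The difference between passing scalars by value and arrays by reference can be seen in a small sketch in plain AWK (Bump is a made-up name):

```awk
# Sketch in plain AWK; Bump is a hypothetical name.
function Bump(n, arr)
{
    n++                  # changes only the local copy of the scalar
    arr["count"]++       # changes the caller's array
}
BEGIN {
    x = 1
    a["count"] = 1
    Bump(x, a)
    print x, a["count"]  # 1 2
}
```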
Local variables are initialized to 0/"" each time the function is called.
No space should be put between the function name and the '(' of the
argument list when calling one of your own functions, to avoid invoking
the simple–minded concatenation operator.
Functions may return an expression, as in
function SumArraySquared(a,    sum, i)
{
for (i in a) #unlike C, array size need not be known separately
sum += a[i] #sum and i are local; sum is automatically initialized to zero
return sum*sum
}
or
function StringUpTo(str, upto)
{
return substr(str, 1, index(str, upto) - 1)
}
(eg StringUpTo("This is: a test", ":") would return "This is"; if “upto”
does not occur in “str”, index returns 0 and StringUpTo returns the empty
string).
Some details about functions:
Newlines are optional after the left curly brace of the function body and
before the closing right brace.
Functions may call each other and may be recursive.
The word func may be used in place of function. For tired typers only.
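A minimal sketch in plain AWK of recursion and of local variables starting out fresh on each call (Factorial and CallCounter are hypothetical names):

```awk
# Sketch in plain AWK; Factorial and CallCounter are hypothetical names.
function Factorial(n)
{
    if (n <= 1)
        return 1
    return n * Factorial(n - 1)   # each call has its own copy of n
}
function CallCounter(    calls)
{
    calls++               # calls is local, so it starts at 0 on every call
    return calls
}
BEGIN {
    print Factorial(5)                   # 120
    print CallCounter(), CallCounter()   # 1 1
}
```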
*/};
struct print
{/*
The “print” statement
“print” sends simply–formatted strings to a file, stdout by default. The
expressions supplied to the print statement are separated from one another
by commas, and may also be entirely surrounded by parentheses. The
variations are
print
print expression1, expression2, ..., expressionN
print (expression1, expression2, ..., expressionN)
A “print” with no expressions is an abbreviation for
print $0
Each expression is converted to a string and printed in turn, with each
comma being replaced by the built–in variable OFS, by default a single
blank. Each print statement is terminated with the built–in ORS, by
default a newline.
The parenthesized version of “print” is necessary if relational operators
are present in the expressions, since the '>' operator can mean “greater
than” or “redirect output to the file...” (see “redirect”).
The print statement is used in virtually every sample program provided,
and the more–sophisticated “printf” is seldom seen since fancy formatting
is not often needed.
print "" #prints just a blank line
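A short sketch in plain AWK of how OFS and ORS shape the output of “print”, and of parentheses keeping '>' a comparison. The comments show what each line prints:

```awk
# Sketch in plain AWK: OFS replaces each comma in a print list,
# and ORS is written after each print statement.
BEGIN {
    print "a", "b", "c"        # a b c   (default OFS is one blank)
    print "a" "b" "c"          # abc     (no commas: just concatenation)
    OFS = "-"
    print "a", "b", "c"        # a-b-c
    print (3 > 2, "done")      # 1-done  (parentheses keep '>' a comparison)
    ORS = ";\n"
    print "x", "y"             # x-y;
}
```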
*/};
struct printf
{/*
The “printf” statement
This statement also has a parenthesized and an unparenthesized form:
printf format, expression1, expression2, ..., expressionN
printf(format, expression1, expression2, ..., expressionN)
and, as with “print”, the parentheses are needed only if a relational
operator is contained in one of the expressions. The “format” argument is
interpreted as a string, and may contain both literal text to be printed
and format specifications for strings or numbers to be printed. Format
specs are indicated in the format string by a '%', and there should be one
expression following the format for each format specification. For
example, if you specify that a string, a number, and a string be printed,
then you list the string, number, and string after the format, in the
same order, separated by commas.
The hAWK versions of the printf and sprintf functions accept the following
conversion specification formats, entirely borrowed from C:
%c an ASCII character. If the argument used for %c is numeric, it is
treated as a character and printed. Otherwise, the argument is
assumed to be a string, and only the first character of that
string is printed.
%d a decimal number (the integer part).
%i just like %d .
%e a floating point number of the form [-]d.dddddde[+-]dd .
%f a floating point number of the form [-]ddd.dddddd .
%g use e or f conversion, whichever is shorter, with nonsignificant zeros
suppressed.
%o an unsigned octal number (again, an integer).
%s a character string.
%x an unsigned hexadecimal number (an integer).
%X like %x , but using ABCDEF instead of abcdef .
%% a single % character; no argument is converted.
There are optional, additional parameters that may lie between the % and
the control letter (also from C):
- the expression should be left justified within its field
(note if the '-' is absent then the expression is right
justified)
width the field should be padded to this width. If width
has a leading zero (as in %05d), then the field will be padded
with zeros. Otherwise it is padded with blanks.
. prec a number giving the maximum number of characters to print
from a string, or the number of digits to the right of the
decimal point.
For example, %-23.14s prints strings in a field 23 characters wide, left
justified, printing at most 14 characters from the string. And %8.4f will
print a floating point number in a field 8 characters wide, right
justified, with 4 digits to the right of the decimal point.
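Because “sprintf” accepts the same format specifications as “printf” but returns the result as a string, it is handy for trying out formats. A sketch in plain AWK of several conversions and modifiers (the '|' characters just mark the edges of each field):

```awk
# Sketch in plain AWK: sprintf builds the same text printf would print.
BEGIN {
    print sprintf("|%5d|", 42)                 # |   42|
    print sprintf("|%-5d|", 42)                # |42   |
    print sprintf("|%05d|", 42)                # |00042|
    print sprintf("|%8.3f|", 3.14159)          # |   3.142|
    print sprintf("|%.3s|", "abcdef")          # |abc|
    print sprintf("|%x|%X|%o|", 255, 255, 8)   # |ff|FF|10|
    print sprintf("|%c|%c|", 65, "hello")      # |A|h|
    print sprintf("|%e|", 1234.5)              # |1.234500e+03|
}
```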
The dynamic width and prec capabilities of the C library printf routines
are not supported. However, they may be simulated by using the hAWK
concatenation operation to build up a format specification dynamically.
Some examples:
“print var” always appends the value of ORS (by default a newline);
to avoid this, use
printf("%s ", var)
and when a newline is needed, supply one yourself with something like
print "" or printf("%s\n", var).
Given strings of variable width in fields $1 and $2, reformat to print
these strings right–justified in two nicely–lined–up columns:
{ one[++n] = $1
two[n] = $2
if (w1 < length($1))
w1 = length($1)
if (w2 < length($2))
w2 = length($2)
}
END {w1 += 2; w2 += 2;#a couple of spaces between columns
for (i = 1; i <= n; ++i)
printf "%" w1 "s" "%" w2 "s\n", one[i], two[i]
}
—this illustrates using the hAWK concatenation operation “to build up a
format specification dynamically”; for example, if w1 = 9 and w2 = 15
(after adding 2) then we get
printf "%9s%15s\n", one[i], two[i]
as the effective printf statement.
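The same trick can be packaged as small helpers that build a format at run time. This is a sketch in plain AWK; PadLeft and PadRight are hypothetical names, not built–in functions:

```awk
# Sketch in plain AWK; PadLeft and PadRight are hypothetical helpers.
# The width is concatenated into the format, since "%*s" is not supported.
function PadLeft(s, w)
{
    return sprintf("%" w "s", s)
}
function PadRight(s, w)
{
    return sprintf("%-" w "s", s)
}
BEGIN {
    print "[" PadLeft("cat", 6) "]"        # [   cat]
    print "[" PadRight("cat", 6) "]"       # [cat   ]
    print "[" PadLeft("aardvark", 3) "]"   # [aardvark]
}
```

Note that the width is a minimum: a string longer than the field is printed in full, never truncated.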
*/};
struct redirect
{/*
OUTPUT:
By default, “print” and “printf” send all of their output to stdout.
However, the redirection operators '>' and '>>' allow you to send output
to any text file. Redirecting output takes one of the forms
print expression–list > outfile
print(expression–list) > outfile
printf format, expression–list > outfile
printf(format, expression–list) > outfile
print > outfile
or any of those with '>>' instead of '>'. The '>' operator will erase the
contents of outfile before beginning to write to it, whereas '>>' will
append what is being printed to outfile without clearing the file first.
Both operators open the file “outfile” the first time it is encountered in
the program and keep it open, so '>' erases the file only once, when it is
first opened; later prints to the same file add to the end of it. The file
will be closed for you at the end of your program, but if you have many
files to write to you should close each output file yourself when you are
done with it, with “close(outfile)”, since the number of files that can be
open at once is limited.
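A sketch in plain AWK of '>', '>>' and close() working together. The pathname is only an example; in hAWK it must be the full pathname of a file you can safely overwrite:

```awk
# Sketch in plain AWK; the pathname below is hypothetical.
BEGIN {
    out = "Disk:notes:log.txt"
    print "first" > out        # file opened and emptied here, once
    print "second" > out       # already open: added after "first"
    close(out)                 # flush and release the file
    print "third" >> out       # reopened in append mode
    close(out)
    while ((getline line < out) > 0)
        n++
    close(out)
    print n " lines written"   # 3 lines written
}
```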
INPUT:
“getline” is a built–in function that allows you to retrieve input records
from the current input file or from any other file. As you know, the
default behaviour of a hAWK program is to retrieve input from your input
files one record at a time, marching through the records and files from
beginning to end. Often, however, one needs to read in a group of lines
until some condition is met, or interrupt regular input to retrieve
records from some other file, and these are the special capabilities that
“getline” provides. It can be used in the following ways:
getline sets $0 from next input record; sets NF, NR, FNR .
getline < file sets $0 from next record of file; sets NF .
getline var sets var from next input record; sets NR, FNR .
getline var < file sets var from next record of file .
and in all cases “getline” returns 1 if a record was successfully
retrieved, 0 if the end of file was encountered, and -1 if some problem
occurred, such as failure to find the file.
The effect of “getline” by itself is to dump the current string in $0 and
replace it with the next input record, setting all the usual built–in
variables. Program execution then continues with the statement following
“getline”. By comparison, the “next” statement does everything that
“getline” by itself does, but in addition processing starts over with the
first pattern in your hAWK program.
If a variable name is present immediately after “getline”, then the input
record is retrieved to the variable instead of to $0. The '<' symbol is
the input redirection operator meaning “get input from the file...”, and
is followed by the name of the input file to use. Note that file names
must be full path names, as is always the case in hAWK.
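A sketch in plain AWK of the usual loop for reading a whole file with “getline var < file” (ReadLines and the pathname are hypothetical). Comparing the result with '> 0' stops the loop both at end of file (0) and on an error (-1); testing the result alone would loop forever on a missing file, because -1 counts as true:

```awk
# Sketch in plain AWK; ReadLines and the pathname are hypothetical.
# Returns the number of lines read into lines[1..n], or -1 on error.
function ReadLines(file, lines,    line, n, status)
{
    n = 0
    while ((status = (getline line < file)) > 0)
        lines[++n] = line
    close(file)
    if (status < 0)
        return -1              # file missing or unreadable
    return n
}
BEGIN {
    n = ReadLines("Disk:data:input.txt", text)
    if (n < 0)
        print "could not read the file"
    else
        print n " lines read"
}
```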
*/};